Efficient exploration with Double Uncertain Value Networks

Authors

  • Thomas M. Moerland
  • Joost Broekens
  • Catholijn M. Jonker
Abstract

This paper studies directed exploration for reinforcement learning agents by tracking uncertainty about the value of each available action. We identify two sources of uncertainty that are relevant for exploration: the first arises from limited data (parametric uncertainty), and the second from the distribution of the returns (return uncertainty). We learn both with deep neural networks: parametric uncertainty is estimated with Bayesian dropout, while return uncertainty is propagated through the Bellman equation as a Gaussian distribution. We then show that both can be estimated jointly in a single network, which we call the Double Uncertain Value Network. The policy is derived directly from the learned distributions using Thompson sampling. Experimental results show that both types of uncertainty can greatly improve learning in domains with a strong exploration challenge.
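
The abstract combines three mechanisms: dropout kept active at decision time to represent parametric uncertainty, a Gaussian output per action to represent return uncertainty, and Thompson sampling over both. The NumPy sketch below shows one way these pieces could fit together at action-selection time, and how a Gaussian can be pushed through a Bellman backup. It is a minimal illustration under stated assumptions, not the authors' implementation: the class TinyDoubleUncertainNet, the functions thompson_action and gaussian_bellman_target, the layer sizes, the dropout keep probability and the softplus parameterisation of the standard deviation are all hypothetical choices, the weights are random rather than trained, and the choice of the next action a' used in the backup is left to the caller.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
STATE_DIM, HIDDEN, N_ACTIONS = 4, 32, 3


class TinyDoubleUncertainNet:
    """Toy value network (hypothetical): one hidden layer with dropout that
    outputs a Gaussian (mean, std) over the return of each action.

    Weights are random here; a real agent would train them.
    """

    def __init__(self, rng, keep_prob=0.9):
        self.rng = rng
        self.keep_prob = keep_prob
        self.W1 = rng.normal(0.0, 0.5, (STATE_DIM, HIDDEN))
        self.W_mu = rng.normal(0.0, 0.5, (HIDDEN, N_ACTIONS))
        self.W_sigma = rng.normal(0.0, 0.5, (HIDDEN, N_ACTIONS))

    def forward(self, state):
        h = np.maximum(0.0, state @ self.W1)  # ReLU hidden layer
        # Dropout stays ON at decision time (Monte Carlo / Bayesian dropout):
        # each call draws a new mask, i.e. a new plausible set of weights.
        mask = self.rng.random(HIDDEN) < self.keep_prob
        h = h * mask / self.keep_prob
        mu = h @ self.W_mu
        # Softplus (computed stably via logaddexp) keeps the std positive.
        sigma = np.logaddexp(0.0, h @ self.W_sigma) + 1e-3
        return mu, sigma


def thompson_action(net, state, rng):
    # 1) Parametric uncertainty: one forward pass with a sampled dropout mask.
    mu, sigma = net.forward(state)
    # 2) Return uncertainty: sample one return per action from N(mu, sigma^2).
    sampled_returns = rng.normal(mu, sigma)
    # 3) Act greedily with respect to the sampled returns.
    return int(np.argmax(sampled_returns))


def gaussian_bellman_target(reward, gamma, mu_next, sigma_next, done):
    # Push a Gaussian through R = r + gamma * Z(s', a'):
    # the mean is shifted and scaled, the std is scaled by gamma.
    # A terminal transition has a deterministic target (std 0).
    if done:
        return reward, 0.0
    return reward + gamma * mu_next, gamma * sigma_next


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    net = TinyDoubleUncertainNet(rng)
    state = rng.normal(size=STATE_DIM)
    actions = [thompson_action(net, state, rng) for _ in range(1000)]
    print("action counts:", np.bincount(actions, minlength=N_ACTIONS))
    print("target:", gaussian_bellman_target(1.0, 0.9, 2.0, 0.5, False))
```

Because a fresh dropout mask and a fresh return sample are drawn on every call, repeated calls for the same state can select different actions even when one action has the highest mean, which is the kind of uncertainty-driven exploration the abstract describes. How the mean and standard deviation are actually trained against the Gaussian target (for example with a Gaussian log-likelihood loss) is not specified here and would be an additional assumption.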

Similar resources

SEISMIC DESIGN OF DOUBLE LAYER GRIDS BY NEURAL NETWORKS

The main contribution of the present paper is to train efficient neural networks for seismic design of double layer grids subject to multiple-earthquake loading. As the seismic analysis and design of such large scale structures require high computational efforts, employing neural network techniques substantially decreases the computational burden. Square-on-square double layer grids with the va...

A Novel Hybrid Modified Binary Particle Swarm Optimization Algorithm for the Uncertain p-Median Location Problem

Here, we investigate the classical p-median location problem on a network in which the vertex weights and the distances between vertices are uncertain. We propose a programming model for the uncertain p-median location problem with tail value at risk objective. Then, we show that it is NP-hard. Therefore, a novel hybrid modified binary particle swarm optimization algorithm is presented to obtai...

FINITE-TIME PASSIVITY OF DISCRETE-TIME T-S FUZZY NEURAL NETWORKS WITH TIME-VARYING DELAYS

This paper focuses on the problem of finite-time boundedness and finite-time passivity of discrete-time T-S fuzzy neural networks with time-varying delays. A suitable Lyapunov-Krasovskii functional (LKF) is established to derive a sufficient condition for finite-time passivity of discrete-time T-S fuzzy neural networks. The dynamical system is transformed into a T-S fuzzy model with uncertain par...

Two Comprehensive Strategies to Prioritize the Capacity Improvement Solutions in Railway Networks (Case Study: Iran)

The aim of this study is to present two comprehensive strategies for prioritizing the capacity improvement solutions in the railway networks. The solutions considered in this study include: promoting to double-track railways, block signaling system, electrification and re-opening the closed stations. The first strategy is based on a local approach, which concentrates on the critical block secti...

Journal:
  • CoRR

Volume: abs/1711.10789

Publication date: 2017